8 research outputs found

    Composable Deep Reinforcement Learning for Robotic Manipulation

    Full text link
    Model-free deep reinforcement learning has been shown to exhibit good performance in domains ranging from video games to simulated robotic manipulation and locomotion. However, model-free methods are known to perform poorly when the interaction time with the environment is limited, as is the case for most real-world robotic tasks. In this paper, we study how maximum entropy policies trained using soft Q-learning can be applied to real-world robotic manipulation. The application of this method to real-world manipulation is facilitated by two important features of soft Q-learning. First, soft Q-learning can learn multimodal exploration strategies by learning policies represented by expressive energy-based models. Second, we show that policies learned with soft Q-learning can be composed to create new policies, and that the optimality of the resulting policy can be bounded in terms of the divergence between the composed policies. This compositionality provides an especially valuable tool for real-world manipulation, where constructing new policies by composing existing skills can provide a large gain in efficiency over training from scratch. Our experimental evaluation demonstrates that soft Q-learning is substantially more sample efficient than prior model-free deep reinforcement learning methods, and that compositionality can be performed for both simulated and real-world tasks.Comment: Videos: https://sites.google.com/view/composing-real-world-policies

    Prognostic model to predict postoperative acute kidney injury in patients undergoing major gastrointestinal surgery based on a national prospective observational cohort study.

    Get PDF
    Background: Acute illness, existing co-morbidities and surgical stress response can all contribute to postoperative acute kidney injury (AKI) in patients undergoing major gastrointestinal surgery. The aim of this study was prospectively to develop a pragmatic prognostic model to stratify patients according to risk of developing AKI after major gastrointestinal surgery. Methods: This prospective multicentre cohort study included consecutive adults undergoing elective or emergency gastrointestinal resection, liver resection or stoma reversal in 2-week blocks over a continuous 3-month period. The primary outcome was the rate of AKI within 7 days of surgery. Bootstrap stability was used to select clinically plausible risk factors into the model. Internal model validation was carried out by bootstrap validation. Results: A total of 4544 patients were included across 173 centres in the UK and Ireland. The overall rate of AKI was 14路2 per cent (646 of 4544) and the 30-day mortality rate was 1路8 per cent (84 of 4544). Stage 1 AKI was significantly associated with 30-day mortality (unadjusted odds ratio 7路61, 95 per cent c.i. 4路49 to 12路90; P < 0路001), with increasing odds of death with each AKI stage. Six variables were selected for inclusion in the prognostic model: age, sex, ASA grade, preoperative estimated glomerular filtration rate, planned open surgery and preoperative use of either an angiotensin-converting enzyme inhibitor or an angiotensin receptor blocker. Internal validation demonstrated good model discrimination (c-statistic 0路65). Discussion: Following major gastrointestinal surgery, AKI occurred in one in seven patients. This preoperative prognostic model identified patients at high risk of postoperative AKI. Validation in an independent data set is required to ensure generalizability
    corecore